Skip to content

Conversation

@gty404
Copy link
Contributor

@gty404 gty404 commented Dec 24, 2025

No description provided.

const std::vector<int32_t>& schema_ids() const { return schema_ids_; }
const std::unordered_set<int32_t>& schema_ids() const { return schema_ids_; }

void ApplyTo(TableMetadataBuilder& builder) const override;
Copy link
Contributor

@HeartLinked HeartLinked Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about implement applyTo here? Although the table update doesn’t seem to be invoked in the current architecture and may need refactoring , I think it makes sense to implement it here.

Comment on lines +161 to +164
auto name = std::string(partition_field.name());
if (name.empty()) {
return InvalidArgument("Cannot use empty partition name: {}", name);
}
Copy link
Contributor

@WZhuo WZhuo Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto name = std::string(partition_field.name());
if (name.empty()) {
return InvalidArgument("Cannot use empty partition name: {}", name);
}
ICEBERG_PRECHECK(!partition_field.name().empty(), "Cannot use empty partition name: {}", name);
auto name = std::string(partition_field.name());

#include "iceberg/util/macros.h"
#include "iceberg/util/type_util.h"
#include "iceberg/util/visit_type.h"
#include "table_metadata.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include "table_metadata.h"
#include "iceberg/table_metadata.h"

Comment on lines +239 to +241
auto max_it = std::ranges::max_element(
id_to_field.get(),
[](const auto& lhs, const auto& rhs) { return lhs.first < rhs.first; });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapper with a Lazy or a simple std::call_once? Avoid search every time.

return max_it->first;
}

bool Schema::SameSchema(const Schema& other) const { return fields_ == other.fields_; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need compare identifier_fields

Comment on lines +692 to +694
if (!schema_ids_to_remove.contains(current_schema_id)) {
return InvalidArgument("Cannot remove current schema: {}", current_schema_id);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!schema_ids_to_remove.contains(current_schema_id)) {
return InvalidArgument("Cannot remove current schema: {}", current_schema_id);
}
ICEBERG_PRECHECK(!schema_ids_to_remove.contains(current_schema_id), "Cannot remove current schema: {}", current_schema_id);

Comment on lines +724 to +731
std::ranges::find_if(changes_, [new_schema_id](const auto& change) {
if (change->kind() != TableUpdate::Kind::kAddSchema) {
return false;
}
auto* add_schema = dynamic_cast<table::AddSchema*>(change.get());
return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) ==
new_schema_id;
}) != changes_.cend();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::ranges::find_if(changes_, [new_schema_id](const auto& change) {
if (change->kind() != TableUpdate::Kind::kAddSchema) {
return false;
}
auto* add_schema = dynamic_cast<table::AddSchema*>(change.get());
return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) ==
new_schema_id;
}) != changes_.cend();
std::ranges::any_of(changes_, [new_schema_id](const auto& change) {
if (change->kind() != TableUpdate::Kind::kAddSchema) {
return false;
}
auto* add_schema = dynamic_cast<table::AddSchema*>(change.get());
return add_schema->schema()->schema_id().value_or(Schema::kInitialSchemaId) ==
new_schema_id;
});

Comment on lines +737 to +739
auto new_schema = std::make_shared<Schema>(
std::vector<SchemaField>(schema.fields().begin(), schema.fields().end()),
new_schema_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lack of identifier_field_ids

Suggested change
auto new_schema = std::make_shared<Schema>(
std::vector<SchemaField>(schema.fields().begin(), schema.fields().end()),
new_schema_id);
auto new_schema = std::make_shared<Schema>(
schema.fields() | std::ranges::to<std::vector>(),
new_schema_id, schema.IdentifierFieldIds());

ICEBERG_RETURN_UNEXPECTED(schema.Validate(metadata_.format_version));

auto new_schema_id = ReuseOrCreateNewSchemaId(schema);
if (schemas_by_id_.find(new_schema_id) != schemas_by_id_.end()) {
Copy link
Contributor

@WZhuo WZhuo Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also check the metadata_.last_column_id equals to new_last_column_id in Java implement, does it matter?

Comment on lines +937 to +938
ICEBERG_BUILDER_ASSIGN_OR_RETURN(auto schema_id,
impl_->AddSchema(*schema, new_last_column_id));
Copy link
Contributor

@WZhuo WZhuo Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ICEBERG_BUILDER_ASSIGN_OR_RETURN(auto schema_id,
impl_->AddSchema(*schema, new_last_column_id));
ICEBERG_BUILDER_RETURN_IF_ERROR(impl_->AddSchema(*schema, new_last_column_id));

schema_id is not used

static constexpr int64_t kInvalidSequenceNumber = -1;
static constexpr int64_t kInitialRowId = 0;

static inline const std::unordered_map<TypeId, int8_t> kMinFormatVersions = {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not inited?

Status TableMetadataBuilder::Impl::SetCurrentSchema(int32_t schema_id) {
if (schema_id == kLastAdded) {
if (!last_added_schema_id_.has_value()) {
return InvalidArgument("Cannot set last added schema: no schema has been added");
Copy link
Contributor

@WZhuo WZhuo Dec 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return InvalidArgument("Cannot set last added schema: no schema has been added");
return ValidationFailed("Cannot set last added schema: no schema has been added");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants